Improving MT coherence through text-level processing of input texts: the COMTIS project

Authors

  • Andrei Popescu-Belis
  • Bruno Cartoni
  • Andrea Gesmundo
  • James Henderson
  • Cristina Grisot
  • Paola Merlo
  • Thomas Meyer
  • Jacques Moeschler
  • Sandrine Zufferey
Abstract

This paper presents an ongoing research project, started in March 2010 and sponsored by the Swiss National Science Foundation, which aims to improve machine translation output in terms of textual coherence. Coherence in text is mainly due to inter-sentential dependencies. Statistical Machine Translation (SMT) systems, which currently translate sentence by sentence, often fail to translate these dependencies correctly. Within the COMTIS project, state-of-the-art linguistics research and Natural Language Processing (NLP) techniques are combined to identify and label inter-sentential dependencies so that they can be learned by SMT systems in the training phase.
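The abstract describes the general idea of marking inter-sentential dependencies in the input text so that a sentence-based SMT system can learn them during training. The sketch below is a minimal illustration under stated assumptions, not the COMTIS implementation: it assumes the labels have already been produced by some upstream classifier that can look beyond the current sentence (represented here by a hypothetical `annotations` dictionary), and it simply appends each label to its token so that differently labeled occurrences of the same word become distinct source tokens. The function name `label_dependencies` and the label `Temporal` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical annotation format: (sentence index, token index) -> label.
# In practice such labels would come from a classifier that may use context
# from neighbouring sentences; here they are simply given.
Annotations = Dict[Tuple[int, int], str]


def label_dependencies(sentences: List[List[str]], annotations: Annotations,
                       sep: str = "_") -> List[str]:
    """Return one training line per sentence.

    Each annotated token is rewritten as token + sep + label, so that a
    sentence-based SMT system sees a distinct source token for each label.
    Tokens without a label are copied unchanged.
    """
    lines = []
    for i, tokens in enumerate(sentences):
        out = []
        for j, tok in enumerate(tokens):
            label: Optional[str] = annotations.get((i, j))
            out.append(f"{tok}{sep}{label}" if label else tok)
        lines.append(" ".join(out))
    return lines


if __name__ == "__main__":
    text = [
        ["The", "plant", "closed", "in", "2009", "."],
        ["Since", "then", ",", "exports", "have", "fallen", "."],
    ]
    # Hypothetical label: "Since" refers back to the previous sentence and is
    # marked as temporal rather than causal.
    ann = {(1, 0): "Temporal"}
    for line in label_dependencies(text, ann):
        print(line)
```

Appending the label to the surface token is only one possible encoding; keeping the label as a separate feature would be another, and the abstract does not say which representation the project uses.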

Similar resources

The Impact of Input Enrichment in Long Text vs. Short Texts on Grammatical Accuracy in Writing Among Elementary Language Learners

This study was conducted to investigate the influence of teaching accurate grammar in writing via enriched long text and short text for the elementary students at Shokouhe_Farhang institute. The homogenized subjects were divided into two groups of 18 and 17 participants. A writing exam was used as a pretest in order to check the students' knowledge of the English past tense. The control group received the...

Text Coherence in Translation

Machine translation (MT) has focused on the problems of syntax and semantics at the sentence level, but the real goal of MT is to translate texts, a fact that has been generally overlooked. There is a crucial difference between a text and a set of unrelated sentences, and in MT, one must avoid destroying the former by translating it into the latter. It is the coherence of text in particular tha...

Formal v. Informal: Register-Differentiated Arabic MT Evaluation in the PLATO Paradigm

Tasks performed on machine translation (MT) output are associated with input text types such as genre and topic. Predictive Linguistic Assessments of Translation Output, or PLATO, MT Evaluation (MTE) explores a predictive relationship between linguistic metrics and the information processing tasks reliably performable on output. PLATO assigns a linguistic signature, which cuts across the task-b...

PaTrans- A Patent Translation System

This paper describes PaTrans, a fully automatic production MT system designed for producing raw translations of patent texts from English into Danish. First we describe the backbone of the system: the EUROTRA research project and prototype. Then we give an overview of the translation process and the basic functionality of PaTrans, and finally we describe some recent extensions fo...

A Proposal for a Coherence Corpus in Machine Translation

Coherence in Machine Translation (MT) has received little attention to date. One of the main issues we face in work in this area is the lack of labelled data. While coherent (human authored) texts are abundant and incoherent texts could be taken from MT output, the latter also contains other errors which are not specifically related to coherence. This makes it difficult to identify and quantify...

Publication date: 2012